Pseudo relevance feedback method for dense retrieval
Wenhao HU, Jing LUO, Xinhui TU
Journal of Computer Applications, 2023, 43(4): 1036-1042. DOI: 10.11772/j.issn.1001-9081.2022030480

The Pseudo Relevance Feedback (PRF) mechanism is an automatic Query Expansion (QE) technique. It uses the original query together with the information contained in the top N documents of the initial retrieval to build a more accurate query, which can further improve the performance of retrieval systems. However, existing PRF methods for dense retrieval have two problems: loss of semantic information caused by text truncation, and high time complexity in the retrieval stage. To address these problems, Dense-PRF was proposed. It is a PRF method for dense retrieval of long texts that operates at paragraph-level granularity. Firstly, the embeddings of the relevant paragraphs in the top N documents of the initial retrieval were obtained through semantic distance calculation. Secondly, the QE term embedding was obtained by average pooling of the relevant paragraph embeddings. Thirdly, a new query embedding was constructed as a weighted combination of the original query embedding and the QE term embedding. Finally, the final retrieval results were obtained using the new query embedding. Dense-PRF was compared with baseline models on two classic long-text test collections, Robust04 and WT2G. Compared with the RepBERT+BM25 model, Dense-PRF improved the precision and the Normalized Discounted Cumulative Gain (NDCG) of the top 20 documents by 1.66 and 1.32 percentage points and by 2.30 and 1.91 percentage points, respectively. Experimental results demonstrate that Dense-PRF can effectively alleviate the vocabulary mismatch between queries and documents and improve retrieval accuracy.
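
The four steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions. Paragraph embeddings are taken as already computed by some dense encoder. Cosine similarity stands in for the unspecified "semantic distance". One best-matching paragraph is kept per feedback document (the hypothetical `per_doc` parameter). A long document is scored by its best-matching paragraph, a MaxP-style aggregation chosen here for illustration. All function names and the weight `alpha` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Document = List[Vector]  # a long document as a list of paragraph embeddings


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, assumed here as the semantic-distance measure."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score_doc(query: Vector, paragraphs: Document) -> float:
    """Assumed MaxP-style scoring: best inner product over the document's paragraphs."""
    return max(sum(q * p for q, p in zip(query, par)) for par in paragraphs)


def retrieve(query: Vector, docs: List[Document], k: int) -> List[int]:
    """Return the ids of the k highest-scoring documents for the query embedding."""
    scores = [score_doc(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]


def select_relevant_paragraphs(query: Vector, feedback_docs: List[Document],
                               per_doc: int = 1) -> List[Vector]:
    """Step 1: keep the per_doc paragraphs of each feedback document closest to the query."""
    selected: List[Vector] = []
    for paragraphs in feedback_docs:
        ranked = sorted(paragraphs, key=lambda p: cosine(query, p), reverse=True)
        selected.extend(ranked[:per_doc])
    return selected


def average_pool(vectors: List[Vector]) -> Vector:
    """Step 2: element-wise mean of the selected paragraph embeddings (QE embedding)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def expand_query(query: Vector, qe: Vector, alpha: float = 0.5) -> Vector:
    """Step 3: weighted combination; alpha (hypothetical) weights the original query."""
    return [alpha * q + (1.0 - alpha) * e for q, e in zip(query, qe)]


def dense_prf(query: Vector, docs: List[Document], n_feedback: int = 3,
              alpha: float = 0.5, k: int = 10) -> List[int]:
    """Initial retrieval, paragraph-level feedback, then step 4: re-retrieval."""
    feedback_ids = retrieve(query, docs, n_feedback)
    paragraphs = select_relevant_paragraphs(query, [docs[i] for i in feedback_ids])
    new_query = expand_query(query, average_pool(paragraphs), alpha)
    return retrieve(new_query, docs, k)
```

For example, with `n_feedback=2` and `alpha=0.5`, the QE embedding is the mean of the best paragraph from each of the two highest-ranked documents. The expanded query is the midpoint between that mean and the original query. Working with paragraphs rather than whole documents lets every part of a long document contribute without truncation.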
